One of the most distinctive design choices in cpprb is its column-oriented, flexibly defined transitions. As far as we know, other replay buffer implementations adopt either row-oriented flexible transitions (i.e. an array of transition objects) or column-oriented non-flexible transitions.
In deep reinforcement learning, a sampled batch is split into variables (e.g. `obs`, `act`, etc.). If the sampled batch is row-oriented, users (or the library) need to convert it into a column-oriented one. (See also the documentation.)
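As a rough illustration of the conversion a row-oriented buffer forces on its users, the sketch below stacks a list of per-transition records into one array per variable. The record layout and the helper name `rows_to_columns` are hypothetical and are not part of cpprb; a column-oriented buffer returns data in the second form directly.

```python
import numpy as np

# Hypothetical row-oriented batch: one dict per transition.
rows = [
    {"obs": np.array([0.0, 1.0]), "act": 0, "rew": 1.0},
    {"obs": np.array([1.0, 2.0]), "act": 1, "rew": 0.5},
    {"obs": np.array([2.0, 3.0]), "act": 0, "rew": 0.0},
]


def rows_to_columns(rows):
    """Stack each field across transitions into one array per variable."""
    return {key: np.stack([np.asarray(r[key]) for r in rows]) for key in rows[0]}


columns = rows_to_columns(rows)
# columns["obs"] has shape (3, 2); columns["act"] and columns["rew"] have shape (3,)
```

This per-batch Python loop is the extra step that a column-oriented layout avoids.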
cpprb can add multiple transitions in a single call. This design is convenient when batches of transitions are moved from local buffers to a global buffer. It is also more efficient, because it not only removes a pure-Python `for` loop but also suppresses unnecessary priority updates for PER (prioritized experience replay). (See also the documentation.)
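To make the idea of batched addition concrete, here is a minimal NumPy sketch of a column-oriented ring buffer whose `add` writes a whole batch with one vectorized assignment per column. `ToyColumnBuffer` and its attributes are hypothetical names used only for illustration; this is not cpprb's implementation, and it leaves out priorities entirely.

```python
import numpy as np


class ToyColumnBuffer:
    """Hypothetical column-oriented ring buffer (illustration only)."""

    def __init__(self, size, obs_dim):
        self.size = size
        self.obs = np.zeros((size, obs_dim), dtype=np.float32)
        self.rew = np.zeros(size, dtype=np.float32)
        self.next_index = 0  # slot that the next transition is written to
        self.stored = 0      # number of valid transitions

    def add(self, obs, rew):
        """Add one or many transitions; returns the slots that were written."""
        obs = np.atleast_2d(obs)
        rew = np.atleast_1d(rew)
        n = obs.shape[0]
        if n > self.size:
            raise ValueError("batch is larger than the buffer")
        # Compute every destination slot at once, wrapping around the end.
        idx = (self.next_index + np.arange(n)) % self.size
        self.obs[idx] = obs  # one vectorized write per column, no Python loop
        self.rew[idx] = rew
        self.next_index = (self.next_index + n) % self.size
        self.stored = min(self.stored + n, self.size)
        return idx


# Move a batch collected locally into a global buffer in a single call.
local_obs = np.arange(6, dtype=np.float32).reshape(3, 2)
local_rew = np.array([1.0, 0.5, 0.0], dtype=np.float32)
global_buffer = ToyColumnBuffer(size=4, obs_dim=2)
first = global_buffer.add(local_obs, local_rew)
second = global_buffer.add(local_obs[:2], local_rew[:2])  # wraps around
```

Because the whole batch arrives in one call, a prioritized buffer built along these lines could refresh its priority structure once per batch rather than once per transition.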
We try to minimize dependencies. Only NumPy is required at run time. A small dependency set is always preferable for avoiding dependency hell.